Porting IDL to Python

import numpy as np
import idlwrap

Introduction

With numpy and scipy, powerful open-source tools for scientific computing are available in python. Nevertheless, many scientific projects, especially in astrophysics, still rely on the proprietary and expensive IDL programming language instead of moving forward to open and reproducible science. This guide aims to help with porting an IDL codebase to python while taking full advantage of python's capabilities.

For help with porting specific IDL functions and routines you are invited to look at the source code of idlwrap, which has porting instructions in its docstrings.

reading this guide

This guide contains code examples in both IDL and python. IDL code blocks are prefixed with IDL>, whereas python code starts with >>>. Also, IDL functions and routines are represented in uppercase.

Rounding

technical background

In computer hardware, floating-point numbers are represented as binary fractions. This binary approximation can cause confusion, e.g. in the well-known example:

>>> 0.1 + 0.1 + 0.1 == 0.3
False

The floating-point value 0.1 is not stored as exactly 0.1 in memory, but rather as 3602879701896397 / 2 ** 55, which is approximately 0.1000000000000000055511151231257827021181583404541015625.... These small differences accumulate and lead to the unexpected result.
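
You can inspect the value that is actually stored, for example with the decimal module:

>>> from decimal import Decimal
>>> Decimal(0.1)
    Decimal('0.1000000000000000055511151231257827021181583404541015625')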

rounding

In IDL, ROUND uses round-half-away-from-zero, also known as commercial rounding. That's what you usually learn in school. It treats positive and negative values symmetrically: If positive and negative numbers are equally probable, this rounding is free of any bias.

IDL> PRINT, ROUND(-0.5), ROUND(0.5), ROUND(1.5), ROUND(2.5)
       -1           1           2           3

python and numpy use round-half-to-even, also known as banker's rounding, which is the default rounding mode in the IEEE-754 standard. On machines that represent floating-point numbers as binary approximations, this rounding is unbiased, whereas round-half-away-from-zero (like IDL's ROUND) introduces a bias away from zero.

>>> round(-0.5), round(0.5), round(1.5), round(2.5)
(0, 0, 2, 2)

numpy's numpy.around function and the ndarray.round method round in the same way as python's built-in round.
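
For example:

>>> np.around([-0.5, 0.5, 1.5, 2.5])
    array([-0.,  0.,  2.,  2.])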

porting

In general, you don't have to worry about which rounding method your program uses. But if you use ROUND to determine e.g. array indices, the different behaviour can lead to diverging results. In such cases, use idlwrap.round, which implements IDL's round-half-away-from-zero rounding.
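
If you prefer to stay with plain numpy, the same round-half-away-from-zero behaviour can be sketched by adding 0.5 with the sign of each value and truncating:

>>> x = np.array([-0.5, 0.5, 1.5, 2.5])
>>> np.trunc(x + np.copysign(0.5, x)).astype(int)  # round half away from zero, like IDL's ROUND
    array([-1,  1,  2,  3])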

Precision

Floating point numbers are stored internally with a fixed number of bits, or precision. The IEEE Standard for Binary Floating-Point Arithmetic (IEEE-754) defines

  • double precision. python default, used by float / np.float64. IDL DOUBLE. Contains 53 bits of precision.
  • single precision. IDL default, called FLOAT. If you really need to, use np.float32.
  • half precision. listed for completeness. Corresponds to np.float16.

IDL often has multiple functions for the different data types, e.g. FINDGEN (FLOAT, 32 bit) and DINDGEN (DOUBLE, 64 bit), or !PI (FLOAT, 32 bit) and !DPI (DOUBLE, 64 bit), while most of numpy's functions accept a dtype=... argument.

You usually do not need to think about bits in python; just use e.g. np.zeros(...) for both FLTARR(...) and DBLARR(...).

Note: INTARR(...) could be replaced by np.zeros(..., dtype=int)
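
For illustration, the dtype argument covers the different IDL array types with a single numpy function:

>>> np.zeros(3, dtype=np.float32)  # like FLTARR(3)
    array([0., 0., 0.], dtype=float32)
>>> np.zeros(3)                    # float64 by default, like DBLARR(3)
    array([0., 0., 0.])
>>> np.zeros(3, dtype=int)         # like INTARR(3)
    array([0, 0, 0])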

Arrays

memory order

general

There are two different ways of storing a matrix/array in memory, as the numpy example below illustrates:

  • column-major. The matrix is stored by columns, so the first index is the most rapidly varying index when moving through the elements of the array
    • consecutive elements in memory differ in their first index
    • e.g. FORTRAN, IDL
    • access elements by [column, row]; the upper-left element is [0, 0]
  • row-major. The matrix is stored by rows, so the last index is the most rapidly varying index when moving through the elements of the array
    • the first index is the row
    • e.g. C, Visual Basic, python
    • access elements by [row, column]
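
For instance, flattening the same numpy array in both orders shows the difference:

>>> a = np.arange(6).reshape((2, 3))  # numpy uses row-major (C) order by default
>>> a
    array([[0, 1, 2],
           [3, 4, 5]])
>>> a.ravel(order="C")  # row-major: the last index varies fastest
    array([0, 1, 2, 3, 4, 5])
>>> a.ravel(order="F")  # column-major (FORTRAN/IDL): the first index varies fastest
    array([0, 3, 1, 4, 2, 5])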


Example 1

Let's look at an example:

IDL> PRINT, FLTARR(2, 4) ; 2 columns
     0.00000      0.00000
     0.00000      0.00000
     0.00000      0.00000
     0.00000      0.00000
>>> np.zeros((2,4)) # 4 columns
    array([[0., 0., 0., 0.],
           [0., 0., 0., 0.]])

In IDL, the first dimension is the number of columns, the second the number of rows. You index the array in the same order, [column, row]. To get the bottom-right element:

IDL> PRINT, (FLTARR(2, 4))[1,3]
     0.00000

In Python, the first dimension is the number of rows. Indexing works like [row, column], so the bottom right element is

>>> np.zeros((2,4))[1,3]
    0.0

Did you notice that the indices [1, 3] are the same in both IDL and python here, even though the two arrays have transposed shapes?
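
In other words, FLTARR(2, 4) corresponds to a numpy array of the reversed shape (4, 2), and its bottom-right element [1, 3] becomes [3, 1]:

>>> a = np.zeros((4, 2))  # same layout as FLTARR(2, 4): 2 columns, 4 rows
>>> a[3, 1]               # IDL's [1, 3] with the indices swapped
    0.0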

Example 2
IDL> a = [[1,2,3,4], [5,6,7,8]]
IDL> a
     1       2       3       4
     5       6       7       8
IDL> SIZE(a)
     2           4           2           2           8
;    n_dimensions, columns,  rows,       ...
IDL> a[3, 0]
     4
>>> a = np.array([[1,2,3,4], [5,6,7,8]])
>>> a
    array([[1, 2, 3, 4],
           [5, 6, 7, 8]])
>>> a.shape
    (2, 4) # (rows, columns)
>>> a[0, 3] # inverse order compared to IDL!
    4

array index ranges

In IDL, the index ranges are inclusive (they include the endpoint):

IDL> (FLTARR(10))[3:5]
     0.00000      0.00000      0.00000 ; -> three elements

While in python, the endpoint is not included:

>>> np.zeros(10)[3:5]
    array([0., 0.]) # -> two elements

This is also the case for the FOR statement.
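
When porting slices by hand, the simplest fix is therefore to add 1 to the upper bound:

>>> np.zeros(10)[3:5+1]  # equivalent to IDL's [3:5]
    array([0., 0., 0.])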

idlwrap provides two ways around this. The first one would be to use the subset_ function:

>>> a = np.zeros(10)
>>> idlwrap.subset_(a, "[3:5]")
    array([0., 0., 0.])

The second way would be to wrap the array inside subsetify_. The resulting object (b) is like a numpy array, but behaves differently when a string is passed as subset:

>>> a = np.zeros(10)
>>> b = idlwrap.subsetify_(a) # b is like a numpy array...
>>> b[3:5] # python behaviour
    array([0., 0.])
>>> b["3:5"] # IDL behaviour: pass indices as string
    array([0., 0., 0.])

float indices

IDL automatically floors array indices, so a[1] and a[1.9] lead to the same result:

IDL> a = INDGEN(3)
IDL> a
       0       1       2
IDL> a[1]
       1
IDL> a[1.9]
       1

In python, you'll have to explicitly convert indices to int, or numpy will throw an IndexError.
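
A minimal example:

>>> a = np.arange(3)
>>> a[int(1.9)]  # explicit conversion, like IDL's automatic flooring
    1
>>> a[1.9]       # raises IndexError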

FOR statement

In IDL, the endpoint of the FOR statement is also included in the loop, while python's range excludes the endpoint.

Example 1: integer ranges
IDL> FOR i=4, 6 DO PRINT, i 
     4
     5
     6 ; -> 3 elements
>>> for i in range(4, 6):
>>>     print(i)
    4
    5 # 2 elements

A common way of dealing with the endpoint in python is to explicitly increment it:

>>> for i in range(4, 6+1):
>>>     print(i)
    4
    5
    6
Example 2: float ranges
IDL> FOR i=3.5, 4.5 DO PRINT, i
     3.50000
     4.50000

While python's built-in range only supports integer arguments, numpy's arange also allows floats:

>>> for i in np.arange(3.5, 4.5+1):
>>>     print(i)
3.5
4.5
Example 3: endpoint not reached
IDL> FOR i=3.5, 5 DO PRINT, i
     3.50000
     4.50000

Adding an explicit +1 to range/np.arange would add another unwanted element to the iteration:

>>> for i in np.arange(3.5, 5+1):
>>>     print(i)
3.5
4.5
5.5

An alternative approach would be to add a very small offset, e.g. 1e-12 to the endpoint, which leads to the expected result:

>>> for i in np.arange(3.5, 5+1e-12):
>>>     print(i)
3.5
4.5

idlwrap's idlwrap.range_ uses 1e-12 as an offset.
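
For illustration, such an inclusive float range can be sketched on top of np.arange (idl_range_ below is a hypothetical helper, not the actual idlwrap implementation):

>>> def idl_range_(start, stop, step=1):
>>>     # hypothetical helper: include the endpoint by adding a tiny offset
>>>     return np.arange(start, stop + 1e-12, step)
>>> idl_range_(3.5, 5)
    array([3.5, 4.5])
>>> idl_range_(4, 6)
    array([4., 5., 6.])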

Example 4: float ranges and array indices

IDL automatically transforms array indices to integers, so this is perfectly valid:

IDL> a = INDGEN(6)
IDL> for i=0.0, 5, 0.7 DO print, i, a[i]
      0.00000       0
     0.700000       0
      1.40000       1
      2.10000       2
      2.80000       2
      3.50000       3
      4.20000       4
      4.90000       4

In python, you'll have to explicitly convert the indices to int: a[int(i)].
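
A direct port of the loop above, combining the endpoint offset with the explicit int conversion (the round is only there to keep the printed floats short):

>>> a = np.arange(6)
>>> for i in np.arange(0.0, 5 + 1e-12, 0.7):
>>>     print(round(i, 2), a[int(i)])
    0.0 0
    0.7 0
    1.4 1
    2.1 2
    2.8 2
    3.5 3
    4.2 4
    4.9 4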

warning: the following code:

FOR i=0, 5, 0.7 DO print, a[i]

would lead to an infinite loop printing 0! The difference is the initial value i=0 (integer) instead of i=0.0 (float): the loop variable takes the type of its initial value, so the increment 0.7 is truncated to 0 and i never changes.

Matrix multiplication

IDL provides two matrix multiplication operators, # and ##:

IDL> a = indgen(2, 3)
IDL> a
     0       1
     2       3
     4       5   
IDL> b = indgen(3, 2)
IDL> b
     0       1       2
     3       4       5  
IDL> a # b
     10          13
     28          40
IDL> a ## b
      3           4           5
      9          14          19
     15          24          33
>>> a = np.arange(2*3).reshape((3, 2))
>>> a
    array([[0, 1],
           [2, 3],
           [4, 5]])
>>> b = np.arange(3*2).reshape((2, 3))
>>> b
    array([[0, 1, 2],
           [3, 4, 5]])

python 3.5+ has a new matrix multiplication operator @, which behaves like IDL's ##:

>>> a @ b
    array([[ 3,  4,  5],
           [ 9, 14, 19],
           [15, 24, 33]])

@ is a shorthand for np.matmul, which is also available in older python/numpy versions.

To replicate the # operator, one would have to use .T to transpose the input and output:

>>> (a.T @ b.T).T
    array([[10, 13],
           [28, 40]])
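
Equivalently, since (a.T @ b.T).T is mathematically the same as b @ a, IDL's a # b can also be ported by simply swapping the operands:

>>> b @ a
    array([[10, 13],
           [28, 40]])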